Generating Poisson-Distributed Differentially Private Synthetic Data
Abstract
The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available while reducing the risk of disclosure associated with releasing the sensitive data directly. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, the utility of the synthetic data is often an afterthought. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially-referenced synthetic data with high utility, albeit without formal privacy guarantees. The objective of this paper is to help bridge the gap between the disease mapping and the formal privacy literatures. In particular, we extend an existing approach for generating formally private synthetic data to the case of Poisson-distributed count data in a way that allows for the infusion of prior information. To evaluate the utility of the synthetic data, we conducted a simulation study inspired by publicly available, county-level heart disease-related death counts. The results of this study demonstrate that the proposed approach for generating differentially private synthetic data outperforms a popular technique when the counts correspond to events arising from subgroups with unequal population sizes or unequal event rates.
Similar resources
Differentially private Bayesian learning on distributed data
Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results, but the proposed algorithms require a single trusted party to have access to the entire data, which is a clear weakness. We consider DP Bayesian learning in a dis...
Differentially Private Distributed Data Release for Data Mining
In this paper, we study the privacy threats caused by distributed data sharing and present an algorithm to securely integrate person-specific sensitive data from multiple data owners, whereby the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis.
PCPs and the Hardness of Generating Private Synthetic Data
Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm A that takes a database D ∈ ({0, 1}^d)^n and outputs a “synthetic database” D̂ all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows x ∈ {0, 1}^d with a given pair of values in a given pair of columns.) This answers...
Differentially Private Distributed Online Learning
Online learning has been in the spotlight from the machine learning society for a long time. To handle massive data in Big Data era, one single learner could never efficiently finish this heavy task. Hence, in this paper, we propose a novel distributed online learning algorithm to solve the problem. Comparing to typical centralized online learner, the distributed learners optimize their own lea...
Generating Differentially Private Datasets Using GANs
In this paper, we present a technique for generating artificial datasets that retain statistical properties of the real data while providing differential privacy guarantees with respect to this data. We include a Gaussian noise layer in the discriminator of a generative adversarial network to make the output and the gradients differentially private with respect to the training data, and then us...
Journal
Journal title: Journal of the Royal Statistical Society
Year: 2021
ISSN: 0035-9238, 2397-2327
DOI: https://doi.org/10.1111/rssa.12711